001    /*
002     * CrochemoreLandauZivUkelson.java
003     *
004     * Copyright 2003 Sergio Anibal de Carvalho Junior
005     *
006     * This file is part of NeoBio.
007     *
008     * NeoBio is free software; you can redistribute it and/or modify it under the terms of
009     * the GNU General Public License as published by the Free Software Foundation; either
010     * version 2 of the License, or (at your option) any later version.
011     *
012     * NeoBio is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
013     * without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
014     * PURPOSE. See the GNU General Public License for more details.
015     *
016     * You should have received a copy of the GNU General Public License along with NeoBio;
017     * if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330,
018     * Boston, MA 02111-1307, USA.
019     *
020     * Proper attribution of the author as the source of the software would be appreciated.
021     *
022     * Sergio Anibal de Carvalho Junior             mailto:sergioanibaljr@users.sourceforge.net
023     * Department of Computer Science               http://www.dcs.kcl.ac.uk
024     * King's College London, UK                    http://www.kcl.ac.uk
025     *
026     * Please visit http://neobio.sourceforge.net
027     *
028     * This project was supervised by Professor Maxime Crochemore.
029     *
030     */
031    
032    package neobio.alignment;
033    
034    import java.io.Reader;
035    import java.io.IOException;
036    
037    /**
038     * This abstract class is the superclass of both global and local sequence alignment
039     * algorithms (with linear gap penalty function) due to Maxime Crochemore, Gad Landau and
040     * Michal Ziv-Ukelson (2002).
041     *
042     * <P>This implementation derives from the paper of M.Crochemore, G.Landau and
043     * M.Ziv-Ukelson, <I>A Sub-quadratic Sequence Alignment Algorithm for Unrestricted Scoring
044     * Matrices</I> (available here as
045     * <A HREF="doc-files/Crochemore_Landau_Ziv-Ukelson_algorithm.pdf">PDF</A> or
046     * <A HREF="doc-files/Crochemore_Landau_Ziv-Ukelson_algorithm.pdf">Postscript</A>).</P>
047     *
048     * <P>It employs Lempel-Ziv compression techniques to speed up the classic dynamic
049     * programmin approach to sequence alignment (see {@linkplain NeedlemanWunsch} and
050     * {@linkplain SmithWaterman} classes). It reduces the time and space complexity from
051     * O(n<SUP>2</SUP>) to O(n<SUP>2</SUP>/log n). In fact, the complexity is actually O(h
052     * n<SUP>2</SUP>/log n), where 0 <= h <= 1 is a real number denoting the entropy of the
053     * text (a measure of how compressible it is). This means that, the more compressible a
054     * sequence is, the less memory the algorithm will require, and the faster it will
055     * run.</P>
056     *
057     * <P>The idea behind this improvement is to identify repetitions in the sequences and
058     * reuse the computation of their alignments. The first step is, therefore, to parse the
059     * sequences into LZ78-like factors. LZ78 is a popular compression algorithm of the
060     * Lempel-Ziv familiy due to J.Ziv and A.Lempel (1978). This factorisation is accomplished
061     * by the {@linkplain FactorSequence} class (for more information about this
062     * factorisation, please refer to the specification of that class) that builds a
063     * doubly-linked list of factors. Each factor is an instance of the {@linkplain Factor}
064     * class (refer to the specification of this class for more information).</P>
065     *
066     * <P>Once the sequences have been parsed, the algorithm builds a matrix of blocks, called
067     * block table, that is vaguely similar to the dynamic programming matrix used by both
068     * <CODE>NeedlemanWunsch</CODE> and <CODE>SmithWaterman</CODE>. Each block contains an
069     * instance of an {@linkplain AlignmentBlock} (please refer to its specification for more
070     * information on what information is stored) and represents the alignment beteween one
071     * factor of each sequence. This block table is, in fact, a partition of the alignment
072     * graph.</P>
073     *
074     * <P>Consider a block B which corresponds to the alignment of factor F1 = xa from
075     * sequence S1 and factor F2 = yb from sequence S2. Here, F1 extends a previous factor of
076     * S1 with character a, while F2 extends a previous factor of S2 with character b. We can
077     * define the input border of B as the set of values at the left and top borders of block
078     * B, and the output border as the set of values at the right and bottom borders of B.
079     * Moreover, we can define the following prefix blocks of B:</P>
080     *
081     * <UL>
082     * <LI><B>Left prefix</B> - is the block that contains the alignment of factor F1 with
083     * a factor F2' = y (a prefix of factor F2).
084     * <LI><B>Diagonal prefix</B> - is the block that contains the alignment of factor F1' = x
085     * (a prefix of factor F1) with a factor F2' = y (a prefix of factor F2).
086     * <LI><B>Top prefix</B> - is the block that contains the alignment of factor F1' = x (a
087     * prefix of factor F1) with factor F2.
088     * </UL>
089     *
090     * <P>Note that each factor has a pointer to its prefix factor, called ancestor (see the
091     * specification of the <CODE>Factor</CODE> class). This pointer makes it easy to retrieve
092     * any of the three prefix blocks of B in constant time.</P>
093     *
094     * <P>Rather than computing each value of the alignment block B, the algorithm will only
095     * compute the values on its input and output borders. This is precisely what makes it
096     * more efficient.</P>
097     *
098     * <P>In this class there is a general specification of how the block table is computed
099     * (see the {@link #computeBlockTable computeBlockTable} method for details). The actual
100     * method depends on the subclasses. In general, there are two phases:</P>
101     *
102     * <UL>
103     * <LI><B>Encoding</B> - the structure of a block B is studied and represented in an
104     * efficient way by computing weights of optimal paths connecting each entry of its input
105     * border to each entry of its output border. This information is encoded in a DIST matrix
106     * where DIST[i,j] stores the weight of an optimal paths connecting entry i of the input
107     * border to entry j the output border of B.
108     * <LI><B>Propagation</B> - the input border of a block B is retrieved (from the left and
109     * top blocks of B) and its output border is computed with the help of the DIST matrix.
110     * </UL>
111     *
112     * <P>In fact, for each block, only one column of the DIST matrix needs to be computed,
113     * all other columns are actually retrieved from its prefix blocks. This is precisely what
114     * is accomplished by the {@link #assembleDistMatrix assembleDistMatrix} method found in
115     * this class (it is general enough for both global and local alignment versions of the
116     * algorithm.</P>
117     *
118     * <P>From the DIST matrix, we obtain the OUT matrix defined as
119     * <CODE>OUT[i,j] = I[i] + DIST[i,j]</CODE> where I is the input border array. This means
120     * that the OUT matrix is the DIST matrix updated by the input border of a block. The
121     * output border is then computed from the OUT matrix by taking the maximum value of each
122     * column. This class also have a general method for assembling the input border (see
123     * {@link #assembleInputBorder assembleInputBorder}</P>
124     *
125     * <P>The OUT matrix is encoded by the {@linkplain OutMatrix} class that takes as
126     * both a DIST matrix and an input border array. Note that <B>it does not compute the OUT
127     * matrix</B>, it just stores the necessary information to retrieve a value at any
128     * position of the matrix.</P>
129     *
130     * <P>A naive approach to compute the output border of a block from the OUT matrix of size
131     * n x n would take a time proportional to n<SUP>2</SUP>. However, it happens that, due to
132     * the nature of the DIST matrix, both DIST and OUT matrices are Monge arrays, which
133     * implies that they are also <I>totally monotone</I>. This property allows the
134     * computation of the output border of B in linear time with the SMAWK algorithm (see the
135     * specification of the {@linkplain Smawk} class for more information on SMAWK).</P>
136     *
137     * <P>This class contains a general specification that is pertinent to both global and
138     * local versions of the algorithm. For more information on each version of, please refer
139     * to the appropriate subclass.</P>
140     *
141     * <P><B>A note about the performance of these algorithms.</B> Although theoretical
142     * results suggest that these algorithms are faster and consume less memory than the
143     * classical methods, in practice it is hard to realise their performance gains.
144     *
145     * <P>These algorithms are extremely complex and require the storage of many extra
146     * pointers and other auxiliary data for each block (see the <CODE>AlignmentBlock</CODE>
147     * class for more details). Hence, even though the space requirement is
148     * O(n<SUP>2</SUP>/log n), which is less than O(n<SUP>2</SUP>), in practice, for most of
149     * the cases these algorithms will take more time and memory space than their clasical
150     * counterparts (we have to keep in mind that the Big Oh notation ignores all constants
151     * involved).</P>
152     *
153     * <P>Therefore, in order to realise the full power of these algorithms, they have to be
154     * used with extremly large and redundant sequences. This will allow a proper
155     * reutilisation of the computations and, maybe, provide an improvement in terms of space
156     * and run time. For instance, it is easy to devise such a sequence if we use a
157     * one-character alphabet because, in this case, a sequence is factorised into a series
158     * of factors that are a prefix of the next.</P>
159     *
160     * @author Sergio A. de Carvalho Jr.
161     * @see CrochemoreLandauZivUkelsonGlobalAlignment
162     * @see CrochemoreLandauZivUkelsonLocalAlignment
163     * @see NeedlemanWunsch
164     * @see SmithWaterman
165     * @see FactorSequence
166     * @see AlignmentBlock
167     * @see OutMatrix
168     * @see Smawk
169     * @see #computeBlockTable
170     * @see #assembleDistMatrix
171     */
172    public abstract class CrochemoreLandauZivUkelson extends PairwiseAlignmentAlgorithm
173    {
174            /**
175             * A constant that indicates that the source of an optimal path has been reached in a
176             * block and that the trace back procedure to retrieve a high scoring alignment can
177             * stop.
178             */
179            protected static final byte STOP_DIRECTION = 0;
180    
181            /**
182             * A constant that indicates that the left direction must be followed to reach the
183             * source of an optimal path in a block during the trace back procedure to retrieve a
184             * high scoring alignment.
185             */
186            protected static final byte LEFT_DIRECTION = 1;
187    
188            /**
189             * A constant that indicates that the diagonal direction must be followed to reach the
190             * source of an optimal path in a block during the trace back procedure to retrieve a
191             * high scoring alignment.
192             */
193            protected static final byte DIAGONAL_DIRECTION = 2;
194    
195            /**
196             * A constant that indicates that the top direction must be followed to reach the
197             * source of an optimal path in a block during the trace back procedure to retrieve a
198             * high scoring alignment.
199             */
200            protected static final byte TOP_DIRECTION = 3;
201    
202            /**
203             * The first factorised sequence being aligned.
204             */
205            protected FactorSequence seq1;
206    
207            /**
208             * The second factorised sequence being aligned.
209             */
210            protected FactorSequence seq2;
211    
212            /**
213             * The block table, which is a matrix of alignment blocks where each block represents
214             * the alignment between one factor of each sequence.
215             */
216            protected AlignmentBlock[][] block_table;
217    
218            /**
219             * Number of rows of the block table. It is determined by the number of factors of the
220             * first sequence.
221             */
222            protected int num_rows;
223    
224            /**
225             * Number of columns of the block table. It is determined by the number of factors of
226             * the second sequence.
227             */
228            protected int num_cols;
229    
230            /**
231             * An instance of the <CODE>Smawk</CODE> class that implements the SMAWK algorithm to
232             * compute the column maxima of a totally monotone matrix. It is used to speed up the
233             * computation of the output border of a block.
234             */
235            protected Smawk smawk = new Smawk();
236    
237            /**
238             * An instance of the <CODE>OutMatrix</CODE> class that encodes the OUT matrix of a
239             * block when supplied with the DIST matrix and the input border array of a block.
240             * Note that it does not compute the OUT matrix itselft, it just stores the necessary
241             * information to retrieve a value at any position of the matrix.
242             *
243             * <P>This object is then used to compute the output border of a block with the
244             * <CODE>Smawk</CODE> class. Note that the <CODE>OutMatrix</CODE> class implements the
245             * <CODE>Matrix</CODE> interface as required by the <CODE>Smawk</CODE> class.
246             *
247             * @see Matrix
248             * @see Smawk
249             */
250            protected OutMatrix out_matrix = new OutMatrix ();
251    
252            /**
253             * Loads sequences into <CODE>FactorSequence</CODE> instances. In case of any error,
254             * an exception is raised by the constructor of <CODE>FactorSequence</CODE> (please
255             * check the specification of that class for specific requirements).
256             *
257             * <P>A <CODE>FactorSequence</CODE> is an LZ78-like factorisation of the sequences
258             * being aligned.
259             *
260             * @param input1 Input for first sequence
261             * @param input2 Input for second sequence
262             * @throws IOException If an I/O error occurs when reading the sequences
263             * @throws InvalidSequenceException If the sequences are not valid
264             * @see FactorSequence
265             */
266            protected void loadSequencesInternal (Reader input1, Reader input2)
267                    throws IOException, InvalidSequenceException
268            {
269                    // load sequences into instances of CharSequence
270                    this.seq1 = new FactorSequence(input1);
271                    this.seq2 = new FactorSequence(input2);
272    
273                    // determine the block table's dimensions
274                    this.num_rows = seq1.numFactors();
275                    this.num_cols = seq2.numFactors();
276            }
277    
278            /**
279             * Frees pointers to loaded sequences and the the block table so that their data can
280             * be garbage collected.
281             */
282            protected void unloadSequencesInternal ()
283            {
284                    this.seq1 = null;
285                    this.seq2 = null;
286                    this.block_table = null;
287            }
288    
289            /**
290             * Computes the block table (the result depends on subclasses, see
291             * <CODE>computeBlockTable</CODE> for details) and requests subclasses to retrieve an
292             * optimal alignment between the loaded sequences. The actual product depends on the
293             * subclasses which can produce a global (see
294             * <CODE>CrochemoreLandauZivUkelsonGlobalAlignment</CODE>) or local alignment (see
295             * <CODE>CrochemoreLandauZivUkelsonLocalAlignment</CODE>).
296             *
297             * <P>Subclasses are required to implement the <CODE>buildOptimalAlignment</CODE>
298             * abstract method defined by this class according to their own method.</P>
299             *
300             * @return an optimal alignment between the loaded sequences
301             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
302             * with the loaded sequences.
303             * @see CrochemoreLandauZivUkelsonGlobalAlignment
304             * @see CrochemoreLandauZivUkelsonLocalAlignment
305             * @see #computeBlockTable
306             * @see #buildOptimalAlignment
307             */
308            protected PairwiseAlignment computePairwiseAlignment ()
309                    throws IncompatibleScoringSchemeException
310            {
311                    // compute block table
312                    computeBlockTable ();
313    
314                    // build and return an optimal global alignment
315                    PairwiseAlignment alignment = buildOptimalAlignment ();
316    
317                    // allow the block table to be garbage collected
318                    block_table = null;
319    
320                    return alignment;
321            }
322    
323            /**
324             * Computes the block table (the result depends on subclasses, see
325             * <CODE>computeBlockTable</CODE> for details) and requests subclasses to locate the
326             * score of the highest scoring alignment between the two sequences in the block
327             * table. The result depends on the subclasses, and either a global alignment
328             * (see <CODE>CrochemoreLandauZivUkelsonGlobalAlignment</CODE>) or local alignment
329             * score (see <CODE>CrochemoreLandauZivUkelsonLocalAlignment</CODE>) will be produced.
330             *
331             * <P>Subclasses are required to implement the <CODE>locateScore</CODE> abstract
332             * method defined by this class according to their own method.</P>
333             *
334             * <P>Note that this method calculates the similarity value only (it doesn't trace
335             * back into the block table to retrieve the alignment itself).</P>
336             *
337             * @return the score of the highest scoring alignment between the loaded sequences
338             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
339             * with the loaded sequences.
340             * @see CrochemoreLandauZivUkelsonGlobalAlignment
341             * @see CrochemoreLandauZivUkelsonLocalAlignment
342             * @see #locateScore
343             */
344            protected int computeScore () throws IncompatibleScoringSchemeException
345            {
346                    // compute block table
347                    computeBlockTable ();
348    
349                    // get score
350                    int     score = locateScore ();
351    
352                    // allow the block table to be garbage collected
353                    block_table = null;
354    
355                    return score;
356            }
357    
358            /**
359             * Computes the block table. This method is a general specification of how the block
360             * table should be computed. It creates the block table according to the number of
361             * factors of each sequence. It then goes over each position of the block table,
362             * retrieves the corresponding factors from each sequence, and repasses the
363             * information to the subclasses that will do the actual computation of each block
364             * using the scoring scheme previously set.
365             *
366             * <P>There are four different cases that defines four abstract methods in this class,
367             * which subclasses must implement:</P>
368             *
369             * <UL>
370             * <LI><B>createRootBlock</B> - creates and computes block at row 0, column 0;
371             * <LI><B>createFirstColumnBlock</B> - creates and computes blocks at column 0 which
372             * corresponds to alignment blocks between factors of sequence 1 and an empty string;
373             * <LI><B>createFirstRowBlock</B> - creates and computes blocks at row 0 which
374             * corresponds to alignment blocks between factors of sequence 2 and an empty string;
375             * <LI><B>createBlock</B> - creates and computes blocks at row > 0 and column > 0
376             * which corresponds to alignment blocks between one factor of sequence 1 and one
377             * factor of sequence 2;
378             * </UL>
379             *
380             * <P>Note that each factor has a serial number which indicates its order in the list
381             * of factors of a sequence. This number will match with the row and column index of
382             * a block in the block table. For instance, if a block has factors F1 and F2 with
383             * serial numbers 12 and 53, respectively, this means that this block is found at row
384             * 12, column 53 of the block table.</P>
385             *
386             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
387             * with the loaded sequences.
388             * @see #createRootBlock
389             * @see #createFirstColumnBlock
390             * @see #createFirstRowBlock
391             * @see #createBlock
392             */
393            protected void computeBlockTable () throws IncompatibleScoringSchemeException
394            {
395                    Factor  factor1, factor2;
396                    int             r, c, max_length;
397    
398                    // create block table
399                    block_table = new AlignmentBlock[num_rows][num_cols];
400    
401                    // find the length of the longest sequence (number of characters)
402                    max_length = Math.max(seq1.numChars(), seq2.numChars());
403    
404                    // prepares the OUT matrix object
405                    out_matrix.init (max_length, scoring.maxAbsoluteScore());
406    
407                    // start at the root of each trie
408                    factor1 = seq1.getRootFactor();
409                    factor2 = seq2.getRootFactor();
410    
411                    // check if roots' indexes are both zero
412                    if (factor1.getSerialNumber() != 0 || factor2.getSerialNumber() != 0)
413                            throw new IndexOutOfBoundsException ("Unexpected factor index.");
414    
415                    // initiate first cell of block table
416                    block_table[0][0] = createRootBlock (factor1, factor2);
417    
418                    // compute first row
419                    for (c = 1; c < num_cols; c++)
420                    {
421                            factor2 = factor2.getNext();
422    
423                            // check if factor2's index equals its column number (except for
424                            // the last factor that can be a repetition of a previous one)
425                            if (c < num_cols - 1 && factor2.getSerialNumber() != c)
426                                    throw new IndexOutOfBoundsException ("Unexpected factor index.");
427    
428                            block_table[0][c] = createFirstRowBlock (factor1, factor2, c);
429                    }
430    
431                    for (r = 1; r < num_rows; r++)
432                    {
433                            factor1 = factor1.getNext();
434    
435                            // check if factor1's index equals its row number (except for
436                            // the last factor that can be a repetition of a previous one)
437                            if (r < num_rows - 1 && factor1.getSerialNumber() != r)
438                                    throw new IndexOutOfBoundsException ("Unexpected factor index.");
439    
440                            // go back to the root of sequence 2
441                            factor2 = seq2.getRootFactor();
442    
443                            if (factor2.getSerialNumber() != 0)
444                                    throw new IndexOutOfBoundsException ("Unexpected factor index.");
445    
446                            // compute first column of current row
447                            block_table[r][0] = createFirstColumnBlock (factor1, factor2, r);
448    
449                            for (c = 1; c < num_cols; c++)
450                            {
451                                    factor2 = factor2.getNext();
452    
453                                    // check if factor2's index equals its column number (except for
454                                    // the last factor that can be a repetition of a previous one)
455                                    if (c < num_cols - 1 && factor2.getSerialNumber() != c)
456                                            throw new IndexOutOfBoundsException ("Unexpected factor index.");
457    
458                                    // compute row r, col c
459                                    block_table[r][c] = createBlock (factor1, factor2, r, c);
460                            }
461                    }
462            }
463    
464            /**
465             * Assembles the DIST matrix of a block by retrieving the DIST columns of its prefix
466             * blocks. In fact, it also stores pointers to the owner block for each column
467             * retrieved. These pointers are later used during the trace back procedure that
468             * builds an optimal alignment from the information computed in the block table. This
469             * method is general enough to suit both global and local alignment versions of the
470             * algorithm.
471             *
472             * @param block the block for which the DIST matrix is needed
473             * @param dim the dimension of the DIST matrix
474             * @param r the row index of this block in the block table
475             * @param c the column index of this block in the block table
476             * @param lc the number of columns of the alignment block
477             * @return the DIST matrix
478             */
479            protected int[][] assembleDistMatrix (AlignmentBlock block, int dim, int r, int c,
480                    int lc)
481            {
482                    AlignmentBlock  ancestor;
483                    Factor                  parent;
484                    int[][]                 dist;
485                    int                             i;
486    
487                    dist = new int[dim][];
488    
489                    // columns to the left of lc
490                    parent = block.factor2.getAncestor();
491                    for (i = lc - 1; i >= 0; i--)
492                    {
493                            ancestor = block_table[r][parent.getSerialNumber()];
494                            block.ancestor[i] = ancestor;
495                            dist[i] = ancestor.dist_column;
496                            parent = parent.getAncestor();
497                    }
498    
499                    // column lc
500                    dist[lc] = block.dist_column;
501                    block.ancestor[lc] = block;
502    
503                    // columns to the right of lc
504                    parent = block.factor1.getAncestor();
505                    for (i = lc + 1; i < dim; i++)
506                    {
507                            ancestor = block_table[parent.getSerialNumber()][c];
508                            block.ancestor[i] = ancestor;
509                            dist[i] = ancestor.dist_column;
510                            parent = parent.getAncestor();
511                    }
512    
513                    return dist;
514            }
515    
516            /**
517             * Assembles the input border of a block by retrieving the values at the output
518             * borders of the left and top blocks. This method is general enough to suit both
519             * global and local alignment versions of the algorithm. Note that it can be used to
520             * assemble the input border of any block but the root one (it will cause an
521             * <CODE>ArrayIndexOutOfBoundsException</CODE>.
522             *
523             * @param dim dimension of the input border
524             * @param r row index of the block in the block table
525             * @param c column index of the block in the block table
526             * @param lr number of row of the block
527             * @return the block's input border
528             */
529            protected int[] assembleInputBorder (int dim, int r, int c, int lr)
530            {
531                    AlignmentBlock  left = null, top = null;
532                    int[]   input;
533                    int             i;
534    
535                    input = new int [dim];
536    
537                    // set up pointers to the left and top blocks (if applicable)
538                    if (c > 0) left = block_table[r][c-1];
539                    if (r > 0) top  = block_table[r-1][c];
540    
541                    for (i = 0; i < dim; i++)
542                    {
543                            if (i < lr)
544                            {
545                                    if (left != null)
546                                            input[i] = left.output_border[left.factor2.length() + i];
547                                    else
548                                            // there is no block to the left, so set a big negative value
549                                            // to make sure it will not be used (unfortunately, MIN_VALUE
550                                            // can overflow to a positive value when substracted by any
551                                            // number, so we use half of it as a workaround)
552                                            input[i] = Integer.MIN_VALUE / 2;
553                            }
554                            else if (i == lr)
555                            {
556                                    if (left != null)
557                                            input[i] = left.output_border[left.factor2.length() + i];
558                                    else
559                                            // no need to check if top is not null
560                                            // (because we assume this is not the root block)
561                                            input[i] = top.output_border[i - lr];
562                            }
563                            else
564                            {
565                                    if (top != null)
566                                            input[i] = top.output_border[i - lr];
567                                    else
568                                            // there is no top block (see note for the left case)
569                                            input[i] = Integer.MIN_VALUE / 2;
570                            }
571                    }
572    
573                    return input;
574            }
575    
576            /**
577             * Traverses a block to retrieve a part of an optimal alignment from the specified
578             * source in the output border to an entry in the input border.
579             *
580             * @param block the block to be traversed
581             * @param source the source of the path in the output border
582             * @param gapped_seq1 the StringBuffer to where the gapped sequence 1 is written to
583             * @param tag_line the StringBuffer to where the tag_line is written to
584             * @param gapped_seq2 the StringBuffer to where the gapped sequence 2 is written to
585             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
586             * with the loaded sequences.
587             */
588            protected void traverseBlock (AlignmentBlock block, int source,
589                    StringBuffer gapped_seq1, StringBuffer tag_line, StringBuffer gapped_seq2)
590                    throws IncompatibleScoringSchemeException
591            {
592                    char char1, char2;
593    
594                    while (block.direction[source] != STOP_DIRECTION)
595                    {
596                            char1 = block.factor1.getNewChar();
597                            char2 = block.factor2.getNewChar();
598    
599                            switch (block.direction[source])
600                            {
601                                    case LEFT_DIRECTION:
602                                            gapped_seq1.insert (0, GAP_CHARACTER);
603                                            tag_line.insert (0, GAP_TAG);
604                                            gapped_seq2.insert (0, char2);
605    
606                                            block = getLeftPrefix (block);
607                                            break;
608    
609                                    case DIAGONAL_DIRECTION:
610                                            gapped_seq1.insert (0, char1);
611                                            if (char1 == char2)
612                                                    if (useMatchTag())
613                                                            tag_line.insert (0, MATCH_TAG);
614                                                    else
615                                                            tag_line.insert (0, char1);
616                                            else if (scoreSubstitution(char1, char2) > 0)
617                                                    tag_line.insert (0, APPROXIMATE_MATCH_TAG);
618                                            else
619                                                    tag_line.insert (0, MISMATCH_TAG);
620                                            gapped_seq2.insert (0, char2);
621    
622                                            block = getDiagonalPrefix (block);
623                                            source --;
624                                            break;
625    
626                                    case TOP_DIRECTION:
627                                            gapped_seq1.insert (0, char1);
628                                            tag_line.insert (0, GAP_TAG);
629                                            gapped_seq2.insert (0, GAP_CHARACTER);
630    
631                                            block = getTopPrefix (block);
632                                            source --;
633                                            break;
634                            }
635                    }
636            }
637    
638            /**
639             * This method is a shorthand to retrieve the left prefix of a block from the block
640             * table.
641             *
642             * @param block the block
643             * @return the block's left prefix
644             */
645            protected AlignmentBlock getLeftPrefix (AlignmentBlock block)
646            {
647                    int prefix_row = block.factor1.getSerialNumber();
648                    int prefix_col = block.factor2.getAncestorSerialNumber();
649    
650                    return block_table[prefix_row][prefix_col];
651            }
652    
653            /**
654             * This method is a shorthand to retrieve the diagonal prefix of a block from the
655             * block table.
656             *
657             * @param block the block
658             * @return the block's diagonal prefix
659             */
660            protected AlignmentBlock getDiagonalPrefix (AlignmentBlock block)
661            {
662                    int prefix_row = block.factor1.getAncestorSerialNumber();
663                    int prefix_col = block.factor2.getAncestorSerialNumber();
664    
665                    return block_table[prefix_row][prefix_col];
666            }
667    
668            /**
669             * This method is a shorthand to retrieve the top prefix of a block from the block
670             * table.
671             *
672             * @param block the block
673             * @return the block's top prefix
674             */
675            protected AlignmentBlock getTopPrefix (AlignmentBlock block)
676            {
677                    int prefix_row = block.factor1.getAncestorSerialNumber();
678                    int prefix_col = block.factor2.getSerialNumber();
679    
680                    return block_table[prefix_row][prefix_col];
681            }
682    
683            /**
684             * Computes the root block of the block table. See subclasses for actual
685             * implementation.
686             *
687             * @param factor1 the factor of the first sequence being aligned
688             * @param factor2 the factor of the second sequence being aligned
689             * @return the root block
690             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
691             * with the loaded sequences.
692             */
693            protected abstract AlignmentBlock createRootBlock (Factor factor1, Factor factor2)
694                    throws IncompatibleScoringSchemeException;
695    
696            /**
697             * Computes a block at the first row (row zero) of the block table, which corresponds
698             * to an alignment block between one factor of sequence 2 and an empty string. See
699             * subclasses for actual implementation.
700             *
701             * @param factor1 the factor of the first sequence being aligned
702             * @param factor2 the factor of the second sequence being aligned
703             * @param col the column index of block in the block table
704             * @return the computed block
705             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
706             * with the loaded sequences.
707             */
708            protected abstract AlignmentBlock createFirstRowBlock (Factor factor1, Factor factor2,
709                    int col) throws IncompatibleScoringSchemeException;
710    
711            /**
712             * Computes a block at the first column (column zero) of the block table, which
713             * corresponds to an alignment block between one factor of sequence 1 and an empty
714             * string. See subclasses for actual implementation.
715             *
716             * @param factor1 the factor of the first sequence being aligned
717             * @param factor2 the factor of the second sequence being aligned
718             * @param row the row index of block in the block table
719             * @return the computed block
720             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
721             * with the loaded sequences.
722             */
723            protected abstract AlignmentBlock createFirstColumnBlock (Factor factor1,
724                    Factor factor2, int row) throws IncompatibleScoringSchemeException;
725    
726            /**
727             * Computes a block of the block table, which corresponds to an alignment block
728             * between one factor of sequence 1 and one factor of sequence 2. See subclasses for
729             * actual implementation.
730             *
731             * @param factor1 the factor of the first sequence being aligned
732             * @param factor2 the factor of the second sequence being aligned
733             * @param row the row index of block in the block table
734             * @param col the column index of block in the block table
735             * @return the computed block
736             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
737             * with the loaded sequences.
738             */
739            protected abstract AlignmentBlock createBlock (Factor factor1, Factor factor2,
740                    int row, int col) throws IncompatibleScoringSchemeException;
741    
742            /**
743             * Retrieves an optimal alignment between the loaded sequences. See subclasses for
744             * actual implementation.
745             *
746             * @return the computed block
747             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
748             * with the loaded sequences.
749             */
750            protected abstract PairwiseAlignment buildOptimalAlignment ()
751                    throws IncompatibleScoringSchemeException;
752    
753            /**
754             * Locates the score of the highest scoring alignment between the two sequences in the
755             * block table after is thas been computed. See subclasses for actual implementation.
756             *
757             * @return the score of the highest scoring alignment between the loaded sequences
758             * @throws IncompatibleScoringSchemeException If the scoring scheme is not compatible
759             * with the loaded sequences.
760             */
761            protected abstract int locateScore ();
762    }